Curse of Dimensionality in the Application of Pivot-based Indexes to the Similarity Search Problem

Author

Ilya Volnyansky

Abstract

In this work we study the validity of the so-called curse of dimensionality for indexing of databases for similarity search. We perform an asymptotic analysis, with a test model based on a sequence of metric spaces (Ω_d) from which we pick datasets X_d in an i.i.d. fashion. We call the subscript d the dimension of the space Ω_d (e.g. for R^d the dimension is just the usual one) and we allow the size of the dataset n = n_d to be such that d is superlogarithmic but subpolynomial in n. We study the asymptotic performance of pivot-based indexing schemes where the number of pivots is o(n/d). We pick the relatively simple cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the spaces Ω_d exhibit the (fairly common) concentration of measure phenomenon, the performance of similarity search using such indexes is asymptotically linear in n. That is, for large enough d, the difference between using such an index and performing a search without an index at all is negligible. Thus we confirm the curse of dimensionality in this setting.
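The pivot-based scheme and the cost model described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean space with standard Gaussian data, a fixed number of randomly chosen pivots, and a range query; none of these choices come from the paper, and the names `PivotIndex`, `range_query` and `scan_fraction` are hypothetical. Only distance computations made at query time are counted, matching the cost model above.

```python
import math
import random


class PivotIndex:
    """Toy pivot table: stores d(x, p) for every data point x and pivot p."""

    def __init__(self, data, pivots, dist=math.dist):
        self.data = data
        self.pivots = pivots
        self.dist = dist
        # Build-time distances are precomputed and not charged to queries.
        self.table = [[dist(x, p) for p in pivots] for x in data]

    def range_query(self, q, radius):
        """Return (points within radius of q, distance computations used)."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        cost = len(self.pivots)  # one computation per pivot
        matches = []
        for x, row in zip(self.data, self.table):
            # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for any p.
            lower = max(
                (abs(qp - xp) for qp, xp in zip(q_to_pivots, row)),
                default=0.0,
            )
            if lower > radius:
                continue  # x is discarded without computing d(q, x)
            cost += 1
            if self.dist(q, x) <= radius:
                matches.append(x)
        return matches, cost


def scan_fraction(d, n=2000, k=8, seed=0):
    """Query cost divided by n, for a radius equal to the nearest-neighbour distance.

    Illustrative assumption: i.i.d. standard Gaussian points in R^d.
    """
    rng = random.Random(seed)
    data = [tuple(rng.gauss(0.0, 1.0) for _ in range(d)) for _ in range(n)]
    pivots = rng.sample(data, k)
    index = PivotIndex(data, pivots)
    q = tuple(rng.gauss(0.0, 1.0) for _ in range(d))
    radius = min(math.dist(q, x) for x in data)
    _, cost = index.range_query(q, radius)
    return cost / n
```

Comparing `scan_fraction(2)` with `scan_fraction(128)` gives a rough feel for how the share of points that survive pivot filtering changes with dimension. A toy run of this kind is only illustrative and does not substitute for the asymptotic analysis.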


Similar resources

Overcoming the Curse of Dimensionality?

We study the behavior of pivot-based algorithms for similarity searching in metric spaces. We show that they are effective tools for intrinsically high-dimensional spaces, and that their performance is basically dependent on the number of pivots used and the precision used to store the distances. In this paper we give a simple yet effective recipe for practitioners seeking a black-box method ...


Fractal Compression Using the Discrete Karhunen-Loeve Transform

Fractal coding of images is a quite recent and efficient method whose major drawback is the very slow compression phase, due to a time-consuming similarity search between image blocks. A general acceleration method based on feature vectors is described, of which we can find many instances in the literature. This general method is then optimized using the well-known Karhunen-Loeve expansion, allowi...


Near neighbor searching with K nearest references

Proximity searching is the problem of retrieving, from a given database, those objects closest to a query. To avoid exhaustive searching, data structures called indexes are built on the database prior to serving queries. The curse of dimensionality is a well-known problem for indexes: in spaces with sufficiently concentrated distance histograms, no index outperforms an exhaustive scan of the da...


Physical Database Design for Efficient Time-Series Similarity Search

Similarity search in time-series databases finds such data sequences whose changing patterns are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. In order to alleviate the well-known dimensionality curse, the previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences, and take only the first tw...


A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if an attacker gets past other security devices, they can detect and prevent the attack. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...



Journal:

CoRR

Volume: abs/0905.2141

Publication date: 2009
